I claim: 

1. A method for time scaling and/or pitch shifting an audio signal, comprising 

analyzing said audio signal using multiple psychoacoustic criteria to identify a 
region of the audio signal in which the omission of a portion of the audio signal or the 
repetition of a portion of the audio signal is inaudible or minimally audible, 

selecting a splice point in said region of the audio signal, 

deleting a portion of the audio signal beginning at the splice point or repeating a 
portion of the audio signal ending at the splice point, and 

reading out the resulting audio signal at a rate that yields a desired audio signal 
time duration and a desired time scaling and/or pitch shifting. 

2. A method for time scaling and/or pitch shifting an audio signal represented by 
samples, comprising 

analyzing said audio signal using multiple psychoacoustic criteria to identify a 
region of the audio signal in which the omission of a portion of the audio signal or the 
repetition of a portion of the audio signal is inaudible or minimally audible, 

selecting a splice point in said region of the audio signal, thereby defining a 
leading segment of the audio signal that leads the splice point, 

selecting an end point spaced from said splice point, thereby defining a trailing 
segment of the audio signal that trails the endpoint, and a target segment of the audio 
signal between the splice and end points, 

joining said leading and trailing segments at said splice point, thereby decreasing 
the number of audio signal samples by omitting the target segment when the end point 
has a higher sample number than said splice point, or increasing the number of samples 
by repeating the target segment when the end point has a lower sample number than said 
splice point, and 

reading out the joined leading and trailing segments at a rate that yields a desired 
audio signal time duration and a desired time scaling and/or pitch shifting. 
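
As a non-limiting illustration of the joining step recited in claim 2, the following Python sketch treats the audio signal as a list of samples and joins the leading segment to the trailing segment. The function name `join_at_splice` and the list representation are assumptions made only for illustration and do not appear in the claims; any crossfading at the join is omitted.

```python
def join_at_splice(samples, splice_point, end_point):
    """Join the leading segment to the trailing segment (illustrative only).

    The leading segment is every sample before splice_point; the trailing
    segment is every sample from end_point onward.
    """
    if end_point == splice_point:
        raise ValueError("end point must be spaced from the splice point")
    leading = samples[:splice_point]
    trailing = samples[end_point:]
    # end_point > splice_point: samples[splice_point:end_point] are omitted.
    # end_point < splice_point: samples[end_point:splice_point] appear twice.
    return leading + trailing


samples = list(range(10))
shorter = join_at_splice(samples, 4, 7)  # -> [0, 1, 2, 3, 7, 8, 9]
longer = join_at_splice(samples, 7, 4)   # -> [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9]
```

The same expression covers both cases because the target segment lies between the two points whichever of them has the higher sample number.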



3. The method of claim 2 wherein: 

a time duration the same as the original time duration results in pitch shifting the 
audio signal, 

a time duration decreased by the same proportion as the relative change in the 
reduction in the number of samples, in the case of omitting the target segment, results in 
time compressing the audio signal, 

a time duration increased by the same proportion as the relative change in the 
increase in the number of samples, in the case of repeating the target segment, results in 
time expanding the audio signal, 

a time duration decreased by a proportion different from the relative change in the 
reduction in the number of samples results in time compressing and pitch shifting the 
audio signal, and 

a time duration increased by a proportion different from the relative change in the 
increase in the number of samples results in time expansion and pitch shifting the audio 
signal.
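
By way of a hedged worked example of the relationships recited in claim 3, the following Python sketch computes a read-out rate from the sample count after joining and the desired time duration. The function `readout_parameters` and its argument names are hypothetical, and the sketch assumes the original signal was sampled at a known rate `fs`.

```python
def readout_parameters(n_original, n_joined, fs, desired_duration):
    """Return (read-out rate, pitch ratio, time ratio); illustrative only.

    n_original: number of samples before joining
    n_joined: number of samples after omitting or repeating target segments
    fs: original sampling rate in Hz (assumed known)
    desired_duration: desired time duration in seconds
    """
    rate = n_joined / desired_duration           # samples read out per second
    pitch_ratio = rate / fs                      # 1.0 means pitch is unchanged
    time_ratio = desired_duration / (n_original / fs)  # 1.0 means same duration
    return rate, pitch_ratio, time_ratio


# Ten percent of the samples omitted, original one-second duration kept:
# the duration is unchanged and the pitch is shifted.
same_duration = readout_parameters(48000, 43200, 48000.0, 1.0)

# Duration decreased in the same proportion as the sample count:
# time compression with the pitch unchanged.
compressed = readout_parameters(48000, 43200, 48000.0, 0.9)
```

A duration changed by a proportion different from the change in sample count yields both a time ratio and a pitch ratio other than 1.0.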

4. The method of claim 2 wherein the end point is also selected to be in said 
region. 

5. The method of claim 2 wherein analyzing said audio signal using multiple 
psychoacoustic criteria includes analyzing said audio signal to identify a region of the 
audio signal in which the audio satisfies at least one criterion of a group of 
psychoacoustic criteria. 

6. The method of claim 5 wherein said psychoacoustic criteria include at least 
one of the following: 

the identified region of said audio signal is substantially premasked or 
postmasked as the result of a transient, 

the identified region of said audio signal is substantially inaudible, 

the identified region of said audio signal is predominantly at high 
frequencies, and 



the identified region of said audio signal is a quieter portion of a segment 
of the audio signal in which a portion or portions of the segment preceding and/or 
following the region is louder. 

7. The method of claim 6 wherein the criterion that the region of said audio is 
predominantly at high frequencies is based on frequencies above about 10 to 12 kHz. 

8. The method of claim 5 wherein said group of psychoacoustic criteria are 
arranged in order of the increasing audibility of artifacts resulting from the joining of the 
leading and trailing segments at said splice point. 

9. The method of claim 8 wherein said region is identified when the highest- 
ranking psychoacoustic criterion, the criterion leading to the least audible artifacts in said 
group, is satisfied. 

10. The method of claim 8 wherein the top-ranked psychoacoustic criterion is 
that the region of said audio signal is substantially premasked or masked as the result of a 
transient. 

11. The method of claim 8 wherein said psychoacoustic criteria include at least 
the following four criteria, ranked in the following order: 

the identified region of said audio signal is substantially premasked or 
postmasked as the result of a transient, 

the identified region of said audio signal is substantially inaudible, 

the identified region of said audio signal is predominantly at high 
frequencies, and 

the identified region of said audio signal is a quieter portion of a segment 
of the audio signal in which a portion or portions of the segment preceding and/or 
following the region is louder. 
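
As a non-limiting illustration of the ranked criteria of claims 8 to 11, the following Python sketch picks, from a set of candidate regions, the region satisfying the highest-ranking criterion. The criterion labels, the tuple layout, and the function `pick_region` are hypothetical names introduced for illustration; how each criterion is actually evaluated on the audio is not shown.

```python
# Hypothetical labels, listed in order of increasing audibility of artifacts.
CRITERIA_RANKED = [
    "transient_masked",   # premasked or postmasked as the result of a transient
    "inaudible",          # substantially inaudible
    "high_frequency",     # predominantly at high frequencies
    "quieter_portion",    # quieter portion with louder audio before and/or after
]


def pick_region(candidates):
    """Return the candidate whose criterion ranks highest, or None.

    candidates: list of (start_sample, end_sample, criterion_label) tuples.
    """
    best = None
    best_rank = len(CRITERIA_RANKED)
    for region in candidates:
        rank = CRITERIA_RANKED.index(region[2])
        if rank < best_rank:
            best, best_rank = region, rank
    return best


candidates = [
    (100, 200, "high_frequency"),
    (300, 350, "transient_masked"),
    (400, 500, "inaudible"),
]
chosen = pick_region(candidates)  # -> (300, 350, "transient_masked")
```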



12. The method of claim 11 wherein the criterion that the region of said audio is 
predominantly at high frequencies is based on frequencies above about 10 to 12 kHz. 

13. The method of claim 5 wherein the end point is also selected to be in said 
region. 

14. The method of claim 2 wherein said step of joining said leading 
and trailing segments at the splice point includes crossfading the leading and trailing 
segments. 

15. The method of claim 14 wherein the crossfading is a linear crossfading. 

16. The method of claim 14 wherein the crossfading is a nonlinear crossfading. 

17. The method of claim 16 wherein the nonlinear crossfading is in accordance 
with a Hanning window. 
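
As a non-limiting illustration of the crossfading of claims 14 to 17, the following Python sketch blends the end of the leading segment into the start of the trailing segment using either a linear ramp or the rising half of a Hanning (Hann) window. The function names, the `shape` argument, and the half-sample offset used to centre the ramp are assumptions made for illustration only.

```python
import math


def fade_in_weights(n, shape="linear"):
    """Return n fade-in weights rising from near 0 to near 1."""
    weights = []
    for i in range(n):
        x = (i + 0.5) / n  # position within the crossfade, strictly in (0, 1)
        if shape == "linear":
            weights.append(x)
        elif shape == "hann":
            # Rising half of a Hann window: 0.5 * (1 - cos(pi * x)).
            weights.append(0.5 * (1.0 - math.cos(math.pi * x)))
        else:
            raise ValueError("unknown crossfade shape: " + shape)
    return weights


def crossfade(fade_out, fade_in, shape="linear"):
    """Blend two equal-length sample lists; the two gains always sum to 1."""
    if len(fade_out) != len(fade_in):
        raise ValueError("segments to crossfade must have equal length")
    weights = fade_in_weights(len(fade_in), shape)
    return [(1.0 - w) * a + w * b for w, a, b in zip(weights, fade_out, fade_in)]


linear = crossfade([1.0] * 4, [0.0] * 4, "linear")  # -> [0.875, 0.625, 0.375, 0.125]
smooth = crossfade([1.0] * 4, [0.0] * 4, "hann")
```

Because the fade-out gain is defined as one minus the fade-in gain, a signal that is identical on both sides of the join passes through the crossfade unchanged in level.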

18. The method of claim 16 wherein the nonlinear crossfading is in accordance 
with a Kaiser-Bessel window. 

19. The method of claim 14 wherein the length of the crossfade resulting from 
crossfading the leading and trailing segments is variable. 

20. The method of claim 19 wherein the length of the crossfade resulting from 
crossfading the leading and trailing segments is selected to minimize audible splicing 
artifacts. 

21. The method of claim 19 wherein the crossfading is a linear crossfading. 

22. The method of claim 19 wherein the crossfading is a nonlinear crossfading. 



23. The method of claim 22 wherein the nonlinear crossfading is in accordance 
with a Hanning window. 

24. The method of claim 22 wherein the nonlinear crossfading is in accordance 
with a Kaiser-Bessel window. 

25. The method of claim 2, wherein in the case of decreasing the number of audio 
signal samples by omitting the target segment, said end point is selected by 
autocorrelating a segment of audio trailing the splice point. 

26. The method of claim 25 wherein said end point is selected by autocorrelating 
a segment of audio trailing the splice point up to a maximum processing point and 
selecting an end point substantially at the point of maximum autocorrelation that is 
greater than a minimum processing point. 
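
As a non-limiting illustration of the end point selection of claims 25 and 26, the following Python sketch compares a short window of audio starting at the splice point with windows starting at each candidate lag between a minimum and a maximum processing point, and returns the lag with the highest normalized correlation. The function `select_end_point`, the fixed comparison window, and the normalization are simplifying assumptions made for illustration; perceptual weighting and phase-based correlation are not shown.

```python
import math


def select_end_point(samples, splice_point, min_lag, max_lag, window):
    """Return the sample index of the best-matching end point, or None.

    min_lag and max_lag stand in for the minimum and maximum processing
    points, expressed as offsets in samples from the splice point.
    """
    reference = samples[splice_point:splice_point + window]
    ref_energy = sum(x * x for x in reference)
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        start = splice_point + lag
        candidate = samples[start:start + window]
        if len(candidate) < window:
            break  # ran past the end of the available audio
        energy = ref_energy * sum(x * x for x in candidate)
        if energy == 0:
            continue  # silent window: correlation is undefined
        score = sum(a * b for a, b in zip(reference, candidate)) / math.sqrt(energy)
        if score > best_score:  # strict comparison keeps the earliest best lag
            best_lag, best_score = lag, score
    return None if best_lag is None else splice_point + best_lag


# A waveform that repeats every 8 samples: the best end point is one period
# after the splice point, so omitting the target segment keeps the waveform
# continuous across the join.
period = [0, 3, 5, 3, 0, -3, -5, -3]
end_point = select_end_point(period * 10, 16, 4, 20, 8)  # -> 24
```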

27. The method of claim 26 wherein said maximum processing point is fixed 
relative to said splice point. 

28. The method of claim 26 wherein said minimum processing point is variable 
relative to said splice point. 

29. The method of claim 28 wherein said minimum processing point is variable 
relative to said splice point in response to signal characteristics near said splice point. 

30. The method of claim 25 wherein said autocorrelation is based on the time- 
domain characteristics of the audio signal. 

31. The method of claim 30 wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 



32. The method of claim 25 wherein said autocorrelation is based on the phase 
characteristics of the audio signal. 

33. The method of claim 32 wherein said autocorrelation is based on the 
instantaneous phase characteristics of the audio signal. 

34. The method of claim 25 wherein said autocorrelation is based on the phase 
and time-domain characteristics of the audio signal. 

35. The method of claim 34 wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 

36. The method of claim 34 wherein said autocorrelation is based on the 
instantaneous phase and time-domain characteristics of the audio signal. 

37. The method of claim 36 wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 

38. The method of claim 2, wherein in the case of increasing the number of audio 
signal samples by repeating the target segment, said end point is selected by cross 
correlating segments of audio leading and trailing the splice point. 

39. The method of claim 38 wherein said end point is selected by cross 
correlating a segment of audio leading the splice point up to a first maximum processing 
point and a segment of audio trailing the splice point up to a second maximum processing 
point and selecting an end point in the segment leading the splice point substantially at 
the point of maximum cross correlation that is greater than a minimum processing point. 

40. The method of claim 39 wherein said first and second maximum processing 
points are fixed relative to said splice point. 



41. The method of claim 40 wherein said first and second maximum processing 
points are the same relative to said splice point. 

42. The method of claim 39 wherein said minimum processing point is variable 
relative to said splice point. 

43. The method of claim 42 wherein said minimum processing point is variable 
relative to said splice point in response to signal characteristics near said splice point. 

44. The method of claim 38 wherein said cross correlation is based on the time- 
domain characteristics of the audio signal. 

45. The method of claim 44, wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 

46. The method of claim 38 wherein said autocorrelation is based on the phase 
characteristics of the audio signal. 

47. The method of claim 46 wherein said autocorrelation is based on the 
instantaneous phase characteristics of the audio signal. 

48. The method of claim 38 wherein said autocorrelation is based on the phase 
and time-domain characteristics of the audio signal. 

49. The method of claim 48, wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 

50. The method of claim 48 wherein said autocorrelation is based on the 
instantaneous phase and time-domain characteristics of the audio signal. 



51. The method of claim 50, wherein the time domain characteristics of the audio 
signal are weighted according to human hearing sensitivity. 

52. A method for time scaling and/or pitch shifting multiple channels of audio 
signals, comprising 

analyzing each of said audio signals using at least one psychoacoustic criterion to 
identify at least one region in each of the audio signals in which the omission of a portion 
of the audio signal or the repetition of a portion of the audio signal is inaudible or 
minimally audible, 

selecting a common splice point in one of said regions in each of the audio 
signals, wherein the splice points in the multiple channels of audio signals are selected to 
be substantially aligned with one another, 

deleting a portion of each audio signal beginning at the common splice point or 
repeating a portion of the audio signal ending at the common splice point, and 

reading out the resulting audio signals at a rate that yields a desired time duration 
for the multiple channels of audio and a desired time scaling and/or pitch shifting for the 
multiple channels of audio. 

53. A method for time scaling and/or pitch shifting multiple channels of audio 
signals, each signal represented by samples, comprising 

analyzing each of said audio signals using at least one psychoacoustic criterion to 
identify at least one region in each of the audio signals in which the omission of a portion 
of the audio signal or the repetition of a portion of the audio signal is inaudible or 
minimally audible, 

selecting a common splice point in one of said regions in each of the audio 
signals, thereby defining a leading segment of the audio signal that leads the splice point, 
wherein the splice points in the multiple channels of audio signals are selected to be 
substantially aligned with one another, 

selecting an end point spaced from said splice point in each of the audio signals, 
thereby defining a trailing segment of the audio signal trailing the endpoint and a target 
segment of the audio signal between the splice and end points, wherein the end points in 
the multiple channels of audio signals are selected to be substantially aligned with one 
another, 

joining said leading and trailing segments at said splice point in each of the audio 
signals, thereby decreasing the number of audio signal samples by omitting the target 
segment when the end point has a higher sample number than said splice point, or 
increasing the number of samples by repeating the target segment when the end point has 
a lower sample number than said splice point, and 

reading out the joined leading and trailing segments in each of the audio signals at 
a rate that yields a desired time duration for the multiple channels of audio and a desired 
time scaling and/or pitch shifting for the multiple channels of audio. 

54. The method of claim 53, wherein: 

a time duration the same as the original time duration results in pitch shifting the 
audio signals, 

a time duration decreased by the same proportion as the relative change in the 
reduction in the number of samples, in the case of omitting the target segment, results in 
time compressing the audio signals, 

a time duration increased by the same proportion as the relative change in the 
increase in the number of samples, in the case of repeating the target segment, results in 
time expanding the audio signals, 

a time duration decreased by a proportion different from the relative change in the 
reduction in the number of samples results in time compressing and pitch shifting the 
audio signals, and 

a time duration increased by a proportion different from the relative change in the 
increase in the number of samples results in time expansion and pitch shifting the audio 
signals. 

55. The method of claim 53 wherein said selecting a common splice point selects 
a splice point in each of said audio signals, one or more of which splice points may 
not be coincident with one or more other splice points, and selects one of said splice 
points as the common splice point. 



56. The method of claim 55 wherein said selecting a common splice point selects 
a common splice point from among said splice points using the psychoacoustic criteria 
employed to identify said regions. 

57. The method of claim 56 wherein said selecting a common splice point selects 
a common splice point from among said splice points by also taking into account cross- 
channel effects using at least one psychoacoustic criterion. 

58. The method of claim 53 wherein said selecting a common splice point 
identifies portions of said audio signals in which identified regions overlap, and selects a 
common splice point in the overlapping portion of said identified regions. 
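
As a non-limiting illustration of claim 58, the following Python sketch finds the portion in which the identified regions of all channels overlap and places a common splice point within it. The function names are hypothetical, one identified region per channel is assumed, and the choice of the midpoint of the overlap is an assumption made only for illustration; the claims do not specify where in the overlapping portion the splice point falls.

```python
def overlapping_portion(regions):
    """Intersect one (start, end) sample range per channel; None if disjoint."""
    start = max(region[0] for region in regions)
    end = min(region[1] for region in regions)
    return (start, end) if start < end else None


def common_splice_point(regions):
    """Return a splice point shared by all channels, or None if no overlap."""
    overlap = overlapping_portion(regions)
    if overlap is None:
        return None
    return (overlap[0] + overlap[1]) // 2  # midpoint: illustrative choice only


left_right = [(100, 200), (150, 260)]      # identified regions for two channels
point = common_splice_point(left_right)    # -> 175
```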

59. The method of claim 58 wherein said selecting a common splice point selects 
a common splice point in the overlapping portion of said identified regions using the 
psychoacoustic criteria employed to identify said regions. 

60. The method of claim 58 wherein said selecting a common splice point selects 
a common splice point in the overlapping portion of said identified regions by also taking 
into account cross-channel effects using at least one psychoacoustic criterion. 

61. The method of claim 53 wherein said selecting a common splice point selects a 
common splice point using the psychoacoustic criteria employed to identify said regions. 

62. The method of claim 61 wherein said selecting a common splice point selects 
a common splice point by also taking into account cross-channel effects using at least one 
psychoacoustic criterion. 

63. The method of claim 53 wherein the end point is also selected to be in said 
region in each of the audio signals. 



64. The method of claim 53 wherein analyzing said audio signal using at least 
one psychoacoustic criterion to identify at least one region in each of the audio signals in 
which the omission of a portion of the audio signal or the repetition of a portion of the 
audio signal is inaudible or minimally audible includes analyzing said audio signal to 
identify a region of the audio signal in which the audio satisfies at least one criterion of a 
group of psychoacoustic criteria. 

65. The method of claim 64 wherein said psychoacoustic criteria include at least 
one of the following: 

the identified region of said audio signal is substantially premasked or 
postmasked as the result of a transient, 

the identified region of said audio signal is substantially inaudible, 

the identified region of said audio signal is predominantly at high 
frequencies, and 

the identified region of said audio signal is a quieter portion of a segment 
of the audio signal in which a portion or portions of the segment preceding and/or 
following the region is louder. 

66. The method of claim 65 wherein the criterion that the region of said audio is 
predominantly at high frequencies is based on frequencies above about 10 to 12 kHz. 

67. The method of claim 64 wherein said group of psychoacoustic criteria are 
arranged in order of the increasing audibility of artifacts resulting from the joining of the 
leading and trailing segments at said splice point. 

68. The method of claim 67 wherein said region is identified when the highest- 
ranking psychoacoustic criterion, the criterion leading to the least audible artifacts in said 
group, is satisfied. 



69. The method of claim 67 wherein the top-ranked psychoacoustic criterion is 
that the region of said audio signal is substantially premasked or masked as the result of a 
transient. 

70. The method of claim 67 wherein said psychoacoustic criteria include at least 
the following four criteria, ranked in the following order: 

the identified region of said audio signal is substantially premasked or 
postmasked as the result of a transient, 

the identified region of said audio signal is substantially inaudible, 

the identified region of said audio signal is predominantly at high 
frequencies, and 

the identified region of said audio signal is a quieter portion of a segment 
of the audio signal in which a portion or portions of the segment preceding and/or 
following the region is louder. 

71. The method of claim 70 wherein the criterion that the region of said audio is 
predominantly at high frequencies is based on frequencies above about 10 to 12 kHz. 

72. The method of claim 64 wherein the end point is also selected to be in said 
region. 



